System and Method for Automated Experience Rating and/or Loss Reserving

The invention relates to a system and a method for automated experience rating and/or loss reserving, in which a certain event $P_{i,f}$ of an initial time interval $i$, with $f=1,\ldots,F_i$, includes development values $P_{i,k,f}$ for a sequence of development intervals $k=1,\ldots,K$. For the events $P_{1,f}$ of the first initial time interval $i=1$, all development values $P_{1,k,f}$, $f=1,\ldots,F_1$, are known. The invention relates particularly also to a computer program product for carrying out this method.

Experience rating relates in the prior art to value developments of parameters of events which take place for the first time in a certain year, the incidence year or initial year, and whose consequences propagate over several years, the so-called development years. Expressed more generally, the events take place at a certain point in time and develop over given time intervals. Furthermore, the values of one and the same event show a dependent, retrospective development over the different development years or development time intervals. The experience rating of the values takes place through extrapolation and/or comparison with the value development of known similar events in the past.

A typical example in the prior art is the experience rating, over several years, of damage events, e.g., of the payment status Z or the reserve status R of a damage event at insurance companies or reinsurers. In the experience rating of damage events, an insurance company knows the development of every single damage event from the time of the notification of damage up to the current status or until adjustment. In experience rating, the classic credibility formula was established through a stochastic model about 30 years ago; since then, numerous variants of the model have been developed, so that today one may speak of an actual credibility theory. The chief problem in applying credibility formulae lies in the unknown parameters, which are determined by the structure of the portfolio. As an alternative to known methods of estimation, the prior art also offers a game-theory approach, for instance: the actuary or insurance statistician knows bounds for the parameters and determines the optimal premium for the least favorable case. Credibility theory also comprises a number of models for reserving for long-term effects. These include a variety of reserving methods which, unlike the credibility formula, do not depend upon unknown parameters. Here, too, the prior art comprises methods using stochastic models which describe the generation of the data. A series of results exists above all for the chain-ladder method, one of the best known methods for calculating outstanding claims payments and/or for extrapolating damage events. The strong points of the chain-ladder method are, on the one hand, its simplicity and, on the other hand, the fact that it is nearly distribution-free, i.e., it is based on almost no assumptions. Distribution-free or non-parametric methods are particularly suited to cases in which the user can give insufficient details or no details at all concerning the distribution to be expected (e.g., a Gaussian distribution) of the parameter to be developed.

In the chain-ladder method, for each event or loss $P_{i,f}$ with $f=1,2,\ldots,F_i$ from incidence year $i=1,\ldots,I$, values $P_{i,k,f}$ are known, where $P_{i,k,f}$ may be, e.g., the payment status or the reserve status at the end of each handling year $k=1,\ldots,K$. An event $P_{i,f}$ therefore consists in this case of a sequence of points

$$P_{i,f} = (P_{i,1,f}, P_{i,2,f}, \ldots, P_{i,K,f})$$

of which the first $K+1-i$ points are known, and the still unknown points $(P_{i,K+2-i,f}, \ldots, P_{i,K,f})$ are to be predicted. The values of the events $P_{i,f}$ form a so-called loss triangle or, more generally, an event-values triangle.
The rows and columns are formed by the damage-incidence years and the handling years. Generally speaking, e.g., the rows show the initial years and the columns show the development years of the examined events, although the presentation may also differ from this. The chain-ladder method is based upon the cumulated loss triangles, whose entries $C_{ij}$ are, e.g., either pure loss payments or loss expenditures (loss payments plus change in the loss reserves). The following holds for the cumulated array elements $C_{ij}$:



from which follows 
From the cumulated values extrapolated by means of the chain-ladder method, the individual event can also be assessed again by assuming a certain distribution of the values, typically, e.g., a Pareto distribution. The Pareto distribution is particularly suited to insurance types such as, e.g., insurance of major losses or reinsurance. The Pareto distribution takes the following form



where $T$ is a threshold value and $\alpha$ is the fit parameter. The simplicity of the chain-ladder method resides especially in the fact that it needs no more than the above loss triangle (cumulated over the development values of the individual events) and, e.g., no information concerning reporting dates, reserving procedures, or assumptions concerning possible distributions of loss amounts. The drawbacks of the chain-ladder method are well known in the prior art (see, e.g., Thomas Mack, "Measuring the Variability of Chain Ladder Reserve Estimates," submitted to the CAS Prize Paper Competition 1993; Greg Taylor, "Chain Ladder Bias," Centre for Actuarial Studies, University of Melbourne, Australia, March 2001, p. 3). A sufficient data history is necessary to obtain a good estimate. In particular, the chain-ladder method proves successful in classes of business such as motor vehicle liability insurance, where the differences between the loss years are attributable in great part to differences in the loss frequencies, since the estimators of the chain-ladder method correspond to the maximum likelihood estimators of a model based on a modified Poisson distribution.

Hence caution is advisable, e.g., for years in which changes in the loss amount distribution occur (e.g., an increase in the limit of liability or changes in the retention), since these changes may lead to structural breaks in the chain-ladder method. In classes of business with extremely long run-off times, such as general liability insurance, the chain-ladder method likewise leads in many cases to usable results, although data such as a reliable estimate of the final loss ratio are seldom available on account of the long run-off time. However, the main drawback of the chain-ladder method is that it is based upon the cumulated loss triangle: through the cumulation of the event values of the events having the same initial year, essential information concerning the individual losses and/or events is lost and can no longer be recovered later on.
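The cumulation criticised above can be made concrete with a minimal sketch of the standard volume-weighted chain-ladder estimator, in which the factor from development year k to k+1 is the ratio of two column sums of the cumulated triangle. The data layout (a dict mapping initial year i to a list of per-event incremental values) and the names `cumulate` and `chain_ladder` are illustrative assumptions; the sketch follows the textbook form of the method rather than any particular variant.

```python
from typing import Dict, List


def cumulate(events: Dict[int, List[List[float]]], K: int) -> Dict[int, List[float]]:
    """Sum all events of each initial year and cumulate over development years.

    This is exactly the step that discards the per-event information.
    """
    triangle = {}
    for i, rows in events.items():
        known = K + 1 - i                      # known development years of year i
        totals = [sum(row[k] for row in rows) for k in range(known)]
        running, cum = 0.0, []
        for t in totals:
            running += t
            cum.append(running)
        triangle[i] = cum                      # C_{i,1}, ..., C_{i,K+1-i}
    return triangle


def chain_ladder(triangle: Dict[int, List[float]], K: int) -> Dict[int, List[float]]:
    """Complete the cumulated triangle with volume-weighted development factors."""
    factors = []
    for k in range(1, K):                      # factor from column k to k+1 (1-based)
        rows = [c for c in triangle.values() if len(c) >= k + 1]
        num = sum(c[k] for c in rows)          # sum of C_{i,k+1}
        den = sum(c[k - 1] for c in rows)      # sum of C_{i,k}
        factors.append(num / den)
    full = {}
    for i, cum in triangle.items():
        row = list(cum)
        while len(row) < K:                    # project the missing columns
            row.append(row[-1] * factors[len(row) - 1])
        full[i] = row
    return full
```

For example, with three initial years and K=3, only the three aggregated rows survive `cumulate`; which individual event contributed what can no longer be read from them.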

A method of T. Mack is known in the prior art (Thomas Mack, Schriftenreihe Angewandte Versicherungsmathematik, booklet 28, pp. 310 ff., Verlag Versicherungswirtschaft e.V., Karlsruhe 1997) in which the values can be propagated, i.e., the values in the loss triangle can be extrapolated without losing the information on the individual events. With the Mack method, therefore, an individual IBNER reserve can be calculated for each loss using its complete numerical basis (IBNER: Incurred But Not Enough Reserved). IBNER claims are understood to mean payment claims which either exceed the predicted values or are still outstanding. The IBNER reserve is useful especially for the experience rating of excess of loss reinsurance contracts, where the reinsurer, as a rule, receives the required individual loss data, at least for the relevant major losses. For the reinsurer, the temporal development of a portfolio of risks is described by a risk process in which the claim numbers and loss amounts are modeled. In excess of loss reinsurance, upon the transition from the original insurer to the reinsurer, the phenomenon of random dilution of the risk process arises; on the other hand, reinsurance combines the portfolios of several original insurers and thus causes their risk processes to overlap. The effects of dilution and overlapping have, until now, been examined above all for Poisson risk processes. For insurance/reinsurance, experience rating by means of the Mack method means that, for each loss $P_{i,f}$ with $f=1,2,\ldots,F_i$ from incidence year or initial year $i=1,\ldots,I$, the payment status $Z_{i,k,f}$ and the reserve status $R_{i,k,f}$ at the end of each handling year or development year $k=1,\ldots,K$ are known up to the current status $(Z_{i,K+1-i,f}, R_{i,K+1-i,f})$. A loss $P_{i,f}$ in this case therefore consists of a sequence of points

$$P_{i,f} = \big((Z_{i,1,f}, R_{i,1,f}), (Z_{i,2,f}, R_{i,2,f}), \ldots, (Z_{i,K,f}, R_{i,K,f})\big)$$

at the payment reserve level, of which the first $K+1-i$ points are known, and the still unknown points $(Z_{i,K+2-i,f}, R_{i,K+2-i,f}), \ldots, (Z_{i,K,f}, R_{i,K,f})$ are to be predicted. Of particular interest is, naturally, the final status $(Z_{i,K,f}, R_{i,K,f})$, with $R_{i,K,f}$ equal to 0 in the ideal case, i.e., the claim is regarded as completely settled; whether this can be achieved depends upon the length $K$ of the development period considered. In the prior art, e.g., in the Mack method, a claim status $(Z_{i,K+1-i,f}, R_{i,K+1-i,f})$ is continued as was the case for similar claims from earlier incidence years. In the conventional methods, therefore, it must be determined, for one thing, when two claims are "similar," and for another thing, what it means to "continue" a claim. Furthermore, besides the IBNER reserve thus obtained, it must be determined, in a second step, how the genuine late claims are to be calculated, about which nothing is yet known at the present time.

To quantify the similarity, the prior art uses, e.g., the Euclidean distance

$$d\big((Z,R),(\tilde Z,\tilde R)\big) = \sqrt{(Z-\tilde Z)^2 + (R-\tilde R)^2}$$

at the payment reserve level. But even with the Euclidean distance there are many possibilities for finding, for a given claim $(P_{i,1,f}, P_{i,2,f}, \ldots, P_{i,K+1-i,f})$, the closest, most similar claim of an earlier incidence year, i.e., the claim $(\tilde P_1, \ldots, \tilde P_k)$ with $k > K+1-i$ for which either

$$\sum_{j=1}^{K+1-i} d(P_{i,j,f}, \tilde P_j) \quad \text{(sum of all previous distances)}$$

or

$$\sum_{j=1}^{K+1-i} w_j \, d(P_{i,j,f}, \tilde P_j) \quad \text{(weighted sum of all distances, with weights } w_j\text{)}$$

or

$$\max_{1 \le j \le K+1-i} d(P_{i,j,f}, \tilde P_j) \quad \text{(maximum distance)}$$

or

$$d(P_{i,K+1-i,f}, \tilde P_{K+1-i}) \quad \text{(current distance)}$$

is minimal.
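The four criteria can be compared side by side in a short sketch. Each claim status is assumed to be a (payment, reserve) pair, and the weights of the weighted sum are a hypothetical parameter, since the text does not specify them; `criteria` is an illustrative name.

```python
import math
from typing import Dict, Sequence, Tuple

Status = Tuple[float, float]  # (payment Z, reserve R)


def d(a: Status, b: Status) -> float:
    """Euclidean distance at the payment reserve level."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def criteria(claim: Sequence[Status], model: Sequence[Status],
             weights: Sequence[float]) -> Dict[str, float]:
    """Evaluate the four criteria over the n = K+1-i known years of `claim`.

    `weights` is a hypothetical choice of the w_j in the weighted sum.
    """
    n = len(claim)
    dists = [d(claim[j], model[j]) for j in range(n)]
    return {
        "sum": sum(dists),                                    # sum of all previous distances
        "weighted_sum": sum(w * x for w, x in zip(weights, dists)),
        "max": max(dists),                                    # maximum distance
        "current": dists[-1],                                 # current distance
    }
```

Each criterion can select a different "most similar" claim for the same data, which illustrates the arbitrariness of the choice criticised later in the text.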

In the Mack method, for example, the current distance is normally used. This means that for a claim $(P_1, \ldots, P_k)$ whose handling is known up to the $k$-th development year, of all other claims $(\tilde P_1, \ldots, \tilde P_j)$ whose development is known at least up to development year $j \ge k+1$, the one considered the most similar is the one for which the current distance $d(P_k, \tilde P_k)$ is smallest.
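A sketch of this model search, assuming claims are lists of (Z, R) pairs; `nearest_model` is a hypothetical helper name. Only candidates known at least one year beyond the claim qualify as models, and only the k-th statuses are compared.

```python
import math
from typing import Optional, Sequence, Tuple

Status = Tuple[float, float]  # (payment Z, reserve R)


def nearest_model(claim: Sequence[Status],
                  candidates: Sequence[Sequence[Status]]) -> Optional[int]:
    """Index of the candidate with the smallest current distance, or None.

    `claim` is known up to development year k = len(claim); a candidate can
    only serve as a model if it is known at least up to year k+1.
    """
    k = len(claim)
    best, best_dist = None, math.inf
    for idx, cand in enumerate(candidates):
        if len(cand) < k + 1:
            continue                       # not developed far enough to be a model
        # current distance: compare the statuses in development year k only
        dist = math.hypot(claim[k - 1][0] - cand[k - 1][0],
                          claim[k - 1][1] - cand[k - 1][1])
        if dist < best_dist:
            best, best_dist = idx, dist
    return best
```

Note that every candidate must be scanned for every claim, which is the linear growth in effort mentioned later in the text.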

The claim $(P_1, \ldots, P_k)$ is now continued in the same way as its closest "model" $(\tilde P_1, \ldots, \tilde P_k, \tilde P_{k+1}, \ldots, \tilde P_j)$. For this, there is the possibility of continuing for a single handling year (i.e., up to $P_{k+1}$) or for several development years at once (e.g., up to $P_j$). In methods such as the Mack method, one typically continues for just one handling year at first and then searches again for a new most similar claim, whereby the claim just continued is continued for a further development year. The next claim found may naturally be the same one again.

For the continuation of the damage claims, there are two possibilities: the additive continuation of $P_k = (Z_k, R_k)$,

$$Z_{k+1} = Z_k + \tilde Z_{k+1} - \tilde Z_k, \qquad R_{k+1} = R_k + \tilde R_{k+1} - \tilde R_k,$$

and the multiplicative continuation of $P_k = (Z_k, R_k)$,

$$Z_{k+1} = Z_k \cdot \frac{\tilde Z_{k+1}}{\tilde Z_k}, \qquad R_{k+1} = R_k \cdot \frac{\tilde R_{k+1}}{\tilde R_k}.$$
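Both continuation rules are easy to state in code. This sketch (hypothetical function names, statuses as (Z, R) pairs) also reproduces two of the drawbacks discussed in the following paragraph: the additive rule can yield a negative reserve, and the multiplicative rule divides by zero when a model component is 0.

```python
from typing import Tuple

Status = Tuple[float, float]  # (Z_k, R_k)


def continue_additive(p: Status, m_k: Status, m_next: Status) -> Status:
    """Add the model's increment: P_{k+1} = P_k + (model_{k+1} - model_k)."""
    return (p[0] + m_next[0] - m_k[0], p[1] + m_next[1] - m_k[1])


def continue_multiplicative(p: Status, m_k: Status, m_next: Status) -> Status:
    """Apply the model's growth factor: P_{k+1} = P_k * model_{k+1} / model_k.

    Raises ZeroDivisionError if a model component in year k is 0.
    """
    return (p[0] * m_next[0] / m_k[0], p[1] * m_next[1] / m_k[1])
```

For instance, continuing the status (10, 2) additively with the model step (5, 10) to (8, 6) gives (13, -2), i.e., a negative reserve.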

It is easy to see that one of the drawbacks of the prior art, especially of the Mack method, lies, among other things, in the type of continuation of the damage claims. The multiplicative continuation is useful only for so-called open claim statuses, i.e., $Z_k > 0$, $R_k > 0$. In the case of probable claim statuses $P_k = (0, R_k)$, $R_k > 0$, the multiplicative continuation must be modified, since otherwise no continuation takes place. Moreover, if $\tilde Z_k = 0$ or $\tilde R_k = 0$, a division by 0 occurs. Similarly, if $\tilde Z_k$ or $\tilde R_k$ is small, the multiplicative method may easily lead to unrealistically high continuations. This does not permit a consistent treatment of the cases; it means that the reserve $R_k$ cannot simply be continued in this case. In the same way, an adjusted claim status $P_k = (Z_k, 0)$, $Z_k > 0$, likewise cannot be developed further. One possibility is simply to leave it unchanged; however, this prevents a revival of the claim. At best it could be continued on the basis of the closest adjusted model, which likewise does not permit a consistent treatment of the cases. With the additive continuation, too, probable claim statuses should meaningfully be continued only on the basis of a likewise probable model, in order to minimize the Euclidean distance and to guarantee a corresponding qualification of the similarity. An analogous drawback arises for adjusted claim statuses if a revival is to be allowed and negative reserves are to be avoided. Quite generally, the additive method can easily lead to negative payments and/or reserves. In addition, in the prior art, a claim $P_k$ cannot be continued if no corresponding model exists, unless further assumptions are inserted into the method. An example is an open claim $P_k$ when, in the same handling year $k$, there is no claim from previous incidence years which is likewise open. One way out of this dilemma is to leave $P_k$ unchanged in this case, i.e., $P_{k+1} = P_k$, which of course does not correspond to any true continuation.

Thus, all in all, in the prior art every current claim status $P_{i,K+1-i,f} = (Z_{i,K+1-i,f}, R_{i,K+1-i,f})$ is developed further step by step, either additively or multiplicatively, up to the end of development and/or handling after $K$ development years. In each step, the nearest model claim status of the same claim status type (probable, open, or adjusted), according to the Euclidean distance, is determined, and the claim status to be continued is continued either additively or multiplicatively according to the further development of the model claim. For the Mack method, it is likewise sensible always to consider as models only actually observed claim developments $\tilde P_k \to \tilde P_{k+1}$, and no extrapolated, i.e., developed, claim developments, since otherwise a correlation and/or a corresponding bias of the events cannot be avoided. Conversely, however, this retains the drawback that already known information about the events is lost.
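The prior-art procedure summarised in this paragraph can be sketched as a loop that develops a claim one year at a time, restricted to observed models of the same status type, with additive continuation and the "leave unchanged" fallback. The exact-zero tests used to classify statuses, the choice of the additive variant, and the names `status_type` and `develop` are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Status = Tuple[float, float]  # (payment Z, reserve R)


def status_type(p: Status) -> str:
    """Classify a status: probable (0, R>0), open (Z>0, R>0), adjusted (Z>0, 0)."""
    z, r = p
    if z == 0 and r > 0:
        return "probable"
    if z > 0 and r > 0:
        return "open"
    if z > 0 and r == 0:
        return "adjusted"
    return "other"


def develop(claim: Sequence[Status], models: Sequence[Sequence[Status]],
            K: int) -> List[Status]:
    """Develop `claim` one development year at a time up to year K.

    Only observed model claims (never developed ones) of the same status type,
    known at least one year further, are eligible; the nearest by current
    Euclidean distance drives an additive continuation. If no model exists,
    the status is left unchanged (P_{k+1} = P_k).
    """
    path = list(claim)
    while len(path) < K:
        k = len(path)                 # claim known up to development year k
        cur = path[-1]
        best: Optional[Sequence[Status]] = None
        best_dist = math.inf
        for m in models:
            if len(m) > k and status_type(m[k - 1]) == status_type(cur):
                dist = math.hypot(cur[0] - m[k - 1][0], cur[1] - m[k - 1][1])
                if dist < best_dist:
                    best, best_dist = m, dist
        if best is None:
            path.append(cur)          # fallback: no true continuation
        else:
            path.append((cur[0] + best[k][0] - best[k - 1][0],
                         cur[1] + best[k][1] - best[k - 1][1]))
    return path
```

The `models` list is never extended with developed claims, mirroring the restriction to observed developments.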

From the construction of the prior art methods it is immediately clear that the methods can also be applied separately, on the one hand to the triangle of payments and on the other hand to the triangle of reserves. Naturally, with the procedure described, other ways of finding the closest claim status as a model in each case could also be permitted. However, this would affect, in particular, the distribution-freedom of the method. It may thus be said that in the prior art the above-mentioned systematic problems cannot be eliminated even by respective modifications, or at best only by inserting further model assumptions into the method. Precisely in the case of complex, dynamically non-linear processes such as the development of damage claims, however, this is not desirable in most cases. Even putting aside the mentioned drawbacks, in the conventional method according to T. Mack it must still always be determined when two claims are similar and what it means to continue a claim, so that minimum basic assumptions and/or model assumptions must be made. In the prior art, however, not only is the choice of the Euclidean metric arbitrary, but so is the choice between the multiplicative and additive methods. Furthermore, the estimation of the error is not defined in detail in the prior art. It is conceivable to define an error, e.g., based on the inverse distance; however, this is not disclosed in the prior art. An important drawback of the prior art is also that each event must be compared with all the previous ones in order to be continued. The computational effort increases linearly with the number of years and linearly with the number of claims in the portfolio. When portfolios are aggregated, the computing effort and the memory requirement increase accordingly.

Neural networks are fundamentally known in the prior art and are used, for instance, for solving optimization problems, for image recognition (pattern recognition), in artificial intelligence, etc. Corresponding to biological nerve networks, a neural network consists of a plurality of network nodes, so-called neurons, which are interconnected via weighted connections (synapses). The neurons are organized in network layers and interconnected. The individual neurons are activated depending upon their input signals and generate a corresponding output signal. A neuron is activated via the summation of its input signals, each weighted by an individual weight factor. Such neural networks are adaptive: the weight factors are systematically changed as a function of given exemplary input and output values until the neural network shows a desired behavior within a defined, predictable error span, such as the prediction of output values for future input values. Neural networks thereby exhibit adaptive capabilities for learning and storing knowledge and associative capabilities for comparing new information with stored knowledge. The neurons (network nodes) may assume a resting state or an excitation state. Each neuron has a plurality of inputs and just one output, which is connected to the inputs of other neurons of the following network layer or, in the case of an output node, represents a corresponding output value. A neuron enters the excitation state when a sufficient number of its inputs are excited above a certain threshold value of the neuron, i.e., when the summation over the inputs reaches a certain threshold value. The knowledge is stored, through adaptation, in the weights of the inputs of a neuron and in the threshold value of the neuron. The weights of a neural network are trained by means of a learning process (see, e.g., G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function," Math. Control, Signals, Syst., 2, 1989, pp. 303-314; M. T. Hagan, M. B. Menhaj, "Training Feedforward Networks with the Marquardt Algorithm," IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, November 1994; K. Hornik, M. Stinchcombe, H. White, "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, 2, 1989, pp. 359-366, etc.).
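To make the neuron description concrete, here is a minimal forward pass of a feedforward network with one sigmoid hidden layer and a linear output layer, the kind of architecture covered by the cited approximation results. The weights are given by hand; fitting them (e.g., with the Marquardt algorithm cited above) is omitted, and the bias term plays the role of a negative threshold. All names are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float], activate: bool = True) -> List[float]:
    """One layer: each neuron forms the weighted sum of its inputs plus a bias
    (the bias acts as a negative threshold) and optionally applies a sigmoid."""
    out = []
    for row, b in zip(weights, biases):
        s = sum(w * x for w, x in zip(row, inputs)) + b
        out.append(sigmoid(s) if activate else s)
    return out


def forward(inputs: Sequence[float], hidden_w, hidden_b, out_w, out_b) -> List[float]:
    """Sigmoid hidden layer followed by a linear output layer."""
    hidden = dense(inputs, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b, activate=False)
```

Usage: `forward([0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], [[2.0, 2.0]], [1.0])` returns `[3.0]`, since both hidden neurons sit at sigmoid(0) = 0.5.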

It is a task of this invention to propose a new system and method for automated experience rating of events and/or loss reserving which does not exhibit the above-mentioned drawbacks of the prior art. In particular, an automated, simple, and rational method is to be proposed for developing a given claim further with an individual increment and/or factor, so that subsequently all the information concerning the development of a single claim is available. The method is to make as few assumptions as possible from the outset concerning the distribution, while at the same time exploiting the maximum possible information on the given cases.

According to the present invention, this goal is achieved in particular by means of the elements of the independent claims. Further advantageous embodiments follow from the dependent claims and the description.

In particular, these goals are achieved by the invention in that development values $P_{i,k,f}$ with development intervals $k=1,\ldots,K$ are assigned to a certain event $P_{i,f}$ of an initial time interval $i$, where $K$ is the last known development interval, with $i=1,\ldots,K$, and all development values $P_{1,k,f}$ are known for the events $P_{1,f}$, at least one neural network being used for determining the development values $P_{i,K+2-i,f}, \ldots, P_{i,K,f}$. For certain events, e.g., the initial time interval can be assigned to an initial year, and the development intervals can be assigned to development years. The development values $P_{i,k,f}$ of the various events $P_{i,f}$ can be scaled, according to their initial time interval, by means of at least one scaling factor. The scaling of the development values $P_{i,k,f}$ has the advantage, among others, that the development values are comparable at differing points in time. This variant embodiment further has the advantage, among others, that no model assumptions need be presupposed for the automated experience rating, e.g., concerning value distributions, system dynamics, etc. In particular, the experience rating is free of approximation preconditions such as the Euclidean measure. This is not possible in this way in the prior art. In addition, the entire information of the data sample is used, without the data records being cumulated. The complete information concerning the individual events is kept in each step and can be called up again at the end. The scaling has the advantage that data records of differing initial time intervals receive comparable orders of magnitude and can thus be compared better.
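The scaling step can be sketched as multiplying all development values of initial year i by one factor per initial year. Deriving that factor from the ratio of an inflation index between a reference year and year i is an assumption here (the index values and `scale_by_initial_year` are hypothetical); the text only requires at least one scaling factor per initial time interval.

```python
from typing import Dict, List


def scale_by_initial_year(values: Dict[int, List[List[float]]],
                          index: Dict[int, float],
                          ref_year: int) -> Dict[int, List[List[float]]]:
    """Multiply every development value of initial year i by index[ref]/index[i].

    `values[i]` lists the events of initial year i, each a list of P_{i,k,f};
    `index` is a hypothetical inflation index per initial year.
    """
    scaled = {}
    for i, events in values.items():
        s = index[ref_year] / index[i]          # scaling factor of initial year i
        scaled[i] = [[s * v for v in event] for event in events]
    return scaled
```

Because each event is scaled individually rather than summed, the per-event information is preserved, in line with the advantage stated above.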

In one variant embodiment, for determining the development values $P_{i,K-(i-j)+1,f}$, $(i-1)$ neural networks $N_{i,j}$ with $j=1,\ldots,(i-1)$ are generated iteratively for each initial time interval and/or initial year $i$, the neural network $N_{i,j+1}$ depending recursively on the neural network $N_{i,j}$. For weighting a certain neural network $N_{i,j}$, the development values $P_{p,q,f}$ can be used, for example, with $p=1,\ldots,(i-1)$ and $q=1,\ldots,K-(i-j)$. This variant embodiment has the advantage, among others, that, as in the preceding variant embodiment, the entire information of the data sample is used without the data records being cumulated. The complete information concerning the individual events is maintained in each step and can be called up again at the end. The networks can additionally be optimized by minimizing a globally introduced error.
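The index bookkeeping of this recursive scheme can be checked in a few lines: for initial year i and iteration j, network $N_{i,j}$ has $K-(i-j)$ inputs and predicts development interval $K-(i-j)+1$. `network_plan` is a hypothetical helper; for K=5 it yields two networks for row 3 and four for row 5, in agreement with Figures 5 and 6.

```python
from typing import List, Tuple


def network_plan(K: int) -> List[Tuple[int, int, int, int]]:
    """List (i, j, n_inputs, target_k) for every network N_{i,j}.

    For initial year i = 2..K and iteration j = 1..i-1, N_{i,j} has
    K-(i-j) input segments (development values k = 1..K-(i-j)) and
    predicts the development value k = K-(i-j)+1.
    """
    plan = []
    for i in range(2, K + 1):
        for j in range(1, i):
            n_inputs = K - (i - j)
            plan.append((i, j, n_inputs, n_inputs + 1))
    return plan
```

For K=3 the plan is `[(2, 1, 2, 3), (3, 1, 1, 2), (3, 2, 2, 3)]`: each row ends with a network that predicts the last development interval K.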

In another variant embodiment, the neural networks $N_{i,j}$ are trained identically for identical development years and/or development intervals $j$, the neural network $N_{i+1,i}$ being generated for an initial time interval and/or initial year $i+1$, and all other neural networks $N_{i+1,j}$, $j<i$, being taken over from previous initial time intervals and/or initial years. This variant embodiment has the advantage, among others, that only known data are used for the experience rating and data determined by the system are not used further, whereby correlation of the errors or of the data is prevented.

In a still different variant embodiment, events $P_{i,f}$ with initial time interval $i<1$ are additionally used for the determination, all development values $P_{i,k,f}$ of these events with $i<1$ being known. This variant embodiment has the advantage, among others, that the additional data records allow the neural networks to be optimized better and their errors to be minimized.
In a further variant embodiment, for the automated experience rating and/or loss reserving, development values $P_{i,k,f}$ with development intervals $k=1,\ldots,K$ are stored assigned to a certain event $P_{i,f}$ of an initial time interval $i$, where $i=1,\ldots,K$ and $K$ is the last known development interval, and where all development values $P_{1,k,f}$ are known for the first initial time interval. For each initial time interval $i=2,\ldots,K$, by means of iterations $j=1,\ldots,(i-1)$, upon each iteration $j$: in a first step, a neural network $N_{i,j}$ is generated having an input layer with $K-(i-j)$ input segments and an output layer, each input segment comprising at least one input neuron and being assigned to a development value $P_{i,k,f}$; in a second step, the neural network $N_{i,j}$ is weighted with the available events $P_{m,f}$ of all initial time intervals $m=1,\ldots,(i-1)$ by means of the development values $P_{m,1,f}, \ldots, P_{m,K-(i-j),f}$ as input and $P_{m,K-(i-j)+1,f}$ as output; and in a third step, the output values $O_{i,f}$ are determined by means of the neural network $N_{i,j}$ for all events $P_{i,f}$ of the initial time interval $i$, the output value $O_{i,f}$ being assigned to the development value $P_{i,K-(i-j)+1,f}$ of the event $P_{i,f}$, and the neural network $N_{i,j+1}$ depending recursively on the neural network $N_{i,j}$. For certain events, e.g., the initial time interval can be assigned to an initial year and the development intervals to development years. This variant embodiment has the same advantages, among others, as the preceding variant embodiments.
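The three steps can be sketched end to end. Because the architecture and training algorithm of $N_{i,j}$ are left open here, this sketch substitutes a ridge-regularised linear least-squares model for each network, so it illustrates the data flow of the recursion, not the neural network itself; all names (`fit_stand_in`, `fill`, `lam`) are hypothetical. The recursive dependence of $N_{i,j+1}$ on $N_{i,j}$ appears as the value predicted in iteration j becoming an input of iteration j+1.

```python
from typing import Dict, List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_stand_in(X: List[List[float]], y: List[float], lam: float = 1e-6) -> List[float]:
    """Ridge least squares with intercept; stand-in for training a network N_{i,j}."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    ata = [[sum(r[u] * r[v] for r in rows) + (lam if u == v else 0.0)
            for v in range(p)] for u in range(p)]
    aty = [sum(r[u] * t for r, t in zip(rows, y)) for u in range(p)]
    return _solve(ata, aty)


def fill(P: Dict[int, List[List[float]]], K: int) -> Dict[int, List[List[float]]]:
    """P[i] lists the events of initial year i, each holding its known values
    P_{i,1..K+1-i,f}. Returns copies completed up to development interval K."""
    out = {i: [list(e) for e in events] for i, events in P.items()}
    for i in range(2, K + 1):
        for j in range(1, i):
            n = K - (i - j)                   # step 1: N_{i,j} has n input segments
            train = [e for m in range(1, i) for e in out[m]]
            w = fit_stand_in([e[:n] for e in train], [e[n] for e in train])  # step 2
            for e in out[i]:                  # step 3: O_{i,f} -> P_{i,n+1,f}
                e.append(w[0] + sum(wk * v for wk, v in zip(w[1:], e[:n])))
    return out
```

Swapping `fit_stand_in` for a trained feedforward network would leave the loop structure of `fill` unchanged.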

In one variant embodiment, a system comprises neural networks $N_i$, each having an input layer with at least one input segment and an output layer, the input and output layers comprising a plurality of neurons which are interconnected in a weighted way. The neural networks $N_i$ can be produced iteratively by means of a data processing unit through software and/or hardware, a neural network $N_{i+1}$ depending recursively on the neural network $N_i$, and each network $N_{i+1}$ comprising in each case one input segment more than the network $N_i$. Each neural network $N_i$, beginning with the neural network $N_1$, can be trained by means of a minimization module through minimizing a locally propagated error, and the recursive system of neural networks can be trained by means of a minimization module through minimizing a globally propagated error based upon the local errors of the neural networks $N_i$. This variant embodiment has the advantage, among others, that the recursively generated neural networks can additionally be optimized by means of the global error. Among other things, it is the combination of the recursive generation of the neural network structure with a double minimization by means of a locally propagated error and a globally propagated error which results in the advantages of this variant embodiment.

In another variant embodiment, the output layer of the neural network $N_i$ is connected in an assigned way to at least one input segment of the input layer of the neural network $N_{i+1}$. This variant embodiment has the advantage, among others, that the system of neural networks can in turn be interpreted as a neural network. Thus, partial networks of a whole network may be weighted locally and, also in the case of global learning, their behavior can be checked and monitored by the system by means of the corresponding data records. This has not been possible until now in this way in the prior art.

At this point, it should be stated that besides the method according to the invention, the present invention also relates to a system for carrying out this method. Furthermore, it is not limited to the said system and method, but equally relates to recursively nested systems of neural networks and to a computer program product for implementing the method according to the invention.

Variant embodiments of the present invention are described below on the basis of examples. The examples of the embodiments are illustrated by the following accompanying figures:

Figure 1 shows a block diagram which reproduces schematically the training and/or determination phase or presentation phase of a neural network for determining the event value $P_{2,5,f}$ of an event $P_{2,f}$ in an upper 5x5 matrix, i.e., with $K=5$. The dashed line T indicates the training phase, and the solid line R the determination phase after learning.

Figure 2 likewise shows a block diagram which, like Figure 1, reproduces schematically the training and/or determination phase of a neural network for determining the event value $P_{3,4,f}$ for the third initial year.
Figure 3 shows a block diagram which, like Figure 1, reproduces schematically the training and/or determination phase of a neural network for determining the event value $P_{3,5,f}$ for the third initial year.

Figure 4 shows a block diagram which schematically shows only the training phase for determining $P_{3,4,f}$ and $P_{3,5,f}$, the calculated values $P_{3,4,f}$ being used for training the network for determining $P_{3,5,f}$.

Figure 5 shows a block diagram which schematically shows the recursive generation of neural networks for determining the values in row 3 of a 5x5 matrix, two networks being generated.

Figure 6 shows a block diagram which schematically shows the recursive generation of neural networks for determining the values in row 5 of a 5x5 matrix, four networks being generated.

Figure 7 shows a block diagram which likewise shows schematically a system according to the invention, the training basis being restricted to the known event values $A_{ij}$.

Figures 1 to 7 illustrate schematically an architecture which may be used for implementing the invention. In this embodiment example, a certain event $P_{i,f}$ of an initial year $i$ includes development values $P_{i,k,f}$ for the automated experience rating of events and/or loss reserving. The index $f$ runs over all events $P_{i,f}$ for a certain initial year $i$, with $f=1,\ldots,F_i$. The development value $P_{i,k,f} = (Z_{i,k,f}, R_{i,k,f}, \ldots)$ is any vector and/or n-tuple of development parameters $Z_{i,k,f}, R_{i,k,f}, \ldots$ which is to be developed for an event. Thus, for example, in the case of insurance, for a damage event $P_{i,k,f}$, $Z_{i,k,f}$ can be the payment status, $R_{i,k,f}$ the reserve status, etc. Any further relevant parameters for an event are conceivable without this affecting the scope of protection of the invention. The development years run from $k=1,\ldots,K$, and the initial years from $i=1,\ldots,I$. $K$ is the last known development year. For the first initial year $i=1$, all development values $P_{1,k,f}$ are given. As already indicated, for this example the number of initial years $I$ and the number of development years $K$ are assumed to be the same, i.e., $I=K$. However, it is quite conceivable that $I \neq K$ without the method or the system being thereby limited. $P_{i,k,f}$ is therefore an n-tuple consisting of the sequence of points and/or matrix elements

$$(Z_{i,k,f}, R_{i,k,f}, \ldots) \quad \text{with } k = 1, 2, \ldots, K$$



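With $I=K$, the result is thereby a square upper triangular matrix and/or block triangular matrix of the known development values $P_{i,k,f}$ (shown here for $K=5$):

$$\begin{pmatrix}
P_{1,1,f} & P_{1,2,f} & P_{1,3,f} & P_{1,4,f} & P_{1,5,f}\\
P_{2,1,f} & P_{2,2,f} & P_{2,3,f} & P_{2,4,f} & \\
P_{3,1,f} & P_{3,2,f} & P_{3,3,f} & & \\
P_{4,1,f} & P_{4,2,f} & & & \\
P_{5,1,f} & & & &
\end{pmatrix}$$

again with $f=1,\ldots,F_i$ running over all events for a certain initial year.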

Thus, the rows of the matrix are assigned to the initial years and the columns of the matrix to the development years. In the embodiment example, $P_{i,k,f}$ shall be limited to the example of damage events in insurance, since the method and/or the system is particularly suitable, e.g., for the experience rating of insurance contracts and/or excess of loss reinsurance contracts. It must be emphasized that the matrix elements $P_{i,k,f}$ may themselves again be vectors and/or matrices, whereupon the above matrix becomes a corresponding block matrix. The method and system according to the invention are, however, suitable quite generally for the experience rating and/or extrapolation of time-delayed non-linear processes. That being said, $P_{i,k,f}$ is a sequence of points

$$(Z_{i,k,f}, R_{i,k,f}, \ldots) \quad \text{with } k = 1, 2, \ldots, K$$

at the payment reserve level, the first $K+1-i$ points of which are known, and the still unknown points $(Z_{i,K+2-i,f}, R_{i,K+2-i,f}), \ldots, (Z_{i,K,f}, R_{i,K,f})$ are to be predicted. If, for this example, $P_{i,k,f}$ is divided into payment level and reserve level, the result obtained analogously for the payment level is the triangular matrix



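$$
\begin{pmatrix}
Z_{11f} & Z_{12f} & \cdots & Z_{1,K-1,f} & Z_{1Kf} \\
Z_{21f} & Z_{22f} & \cdots & Z_{2,K-1,f} & \\
\vdots & \vdots & & & \\
Z_{K-1,1,f} & Z_{K-1,2,f} & & & \\
Z_{K1f} & & & &
\end{pmatrix}
$$

and for the reserve level the triangular matrix

$$
\begin{pmatrix}
R_{11f} & R_{12f} & \cdots & R_{1,K-1,f} & R_{1Kf} \\
R_{21f} & R_{22f} & \cdots & R_{2,K-1,f} & \\
\vdots & \vdots & & & \\
R_{K-1,1,f} & R_{K-1,2,f} & & & \\
R_{K1f} & & & &
\end{pmatrix}
$$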
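To make the split into payment and reserve level concrete, the following Python sketch assumes a hypothetical in-memory layout in which `triangle[i][f]` holds the known points $(Z_{ikf}, R_{ikf})$ of event $f$ in initial year $i$ (0-based), i.e., the first $K+1-i$ points in the 1-based notation above. The function name `split_levels` and the padding of unknown points with `None` are illustrative choices, not part of the described system.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]                  # (Z_ikf, R_ikf): payment and reserve status
Level = List[List[List[Optional[float]]]]    # [initial year][event][development year]


def split_levels(triangle: List[List[List[Point]]], K: int) -> Tuple[Level, Level]:
    """Split P_ikf = (Z_ikf, R_ikf) into a payment and a reserve triangle.

    triangle[i][f] lists the known points of event f in initial year i (0-based);
    unknown development years are padded with None.
    """
    payments: Level = []
    reserves: Level = []
    for i, year in enumerate(triangle):
        known = K - i                         # K+1-i known points for the 1-based year i+1
        z_rows, r_rows = [], []
        for points in year:
            if len(points) != known:
                raise ValueError(f"initial year {i + 1}: expected {known} points, got {len(points)}")
            pad: List[Optional[float]] = [None] * (K - known)
            z_rows.append([z for z, _ in points] + pad)
            r_rows.append([r for _, r in points] + pad)
        payments.append(z_rows)
        reserves.append(r_rows)
    return payments, reserves
```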

Thus, in the experience rating of damage events, the development of each individual damage event $f$ is known from the time the damage is reported in the initial year $i$ up to the current status (current development year $k$) or until adjustment. This information may be stored in a database, which may be accessed, e.g., via a network by means of a data processing unit. However, the database may also be accessible directly via an internal data bus of the system according to the invention, or be read out in some other way.

In order to use the data in the claims example, the triangular matrices are scaled in a first step, i.e., the damage values must first be made comparable with respect to their assigned time by means of the respective inflation values. The inflation index may likewise be read out of corresponding databases or entered into the system by means of input units. The inflation index for a country may, for example, look like the following:



Year    Inflation Index (1989 = 100)    Annual Inflation Value
1989    100.000                         1.000
1990    105.042                         1.050
1991    112.920                         1.075
1992    121.429                         1.075
1993    128.676                         1.060
1994    135.496                         1.053
1995    142.678                         1.053
1996    148.813                         1.043
1997    153.277                         1.030
1998    157.109                         1.025
1999    163.236                         1.039
2000    171.398                         1.050
2001    177.740                         1.037
2002    185.738                         1.045
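A minimal sketch of the inflation scaling, assuming that damage values are converted to the money of a chosen base year by the ratio of the index values; the text above does not fix the base year or the exact scaling rule, so `scale_to_base_year` and its parameters are hypothetical. The helper `annual_inflation_values` reproduces the third column of the table as the year-on-year ratio of the index.

```python
from typing import Dict

# Inflation index from the table above (1989 = 100)
INFLATION_INDEX: Dict[int, float] = {
    1989: 100.0, 1990: 105.042, 1991: 112.920, 1992: 121.429, 1993: 128.676,
    1994: 135.496, 1995: 142.678, 1996: 148.813, 1997: 153.277, 1998: 157.109,
    1999: 163.236, 2000: 171.398, 2001: 177.740, 2002: 185.738,
}


def annual_inflation_values(index: Dict[int, float]) -> Dict[int, float]:
    """Year-on-year factor index[year] / index[year - 1], rounded like the table."""
    years = sorted(index)
    return {y: round(index[y] / index[prev], 3) for prev, y in zip(years, years[1:])}


def scale_to_base_year(amount: float, year: int, base_year: int,
                       index: Dict[int, float] = INFLATION_INDEX) -> float:
    """Express an amount from `year` in money of `base_year` (assumed scaling rule)."""
    return amount * index[base_year] / index[year]
```

For example, under this rule a payment of 100 made in 1990 corresponds to about 95.2 in 1989 money.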



Further scaling factors, such as regional dependencies, are equally conceivable. If damage events are compared and/or extrapolated in more than one country, the respective national dependencies are added. In the general, non-insurance-specific case, the scaling may also relate to dependencies such as the mean age of populations of living beings, influences of nature, etc.

For the automated determination of the development values $P_{i,K+2-i,f}, \dots, P_{i,K,f} = (Z_{i,K+2-i,f}, R_{i,K+2-i,f}), \dots, (Z_{i,K,f}, R_{i,K,f})$, the system and/or method comprises at least one neural network. Conventional static and/or dynamic neural networks may be chosen, for example feed-forward (heteroassociative) networks such as a perceptron or a multi-layer perceptron (MLP), but other network structures, such as recurrent networks, are also conceivable. The differing structure of feed-forward networks, as opposed to networks with feedback (recurrent networks), determines the way in which information is processed by the network.
In the case of a static neural network, the structure is meant to ensure the replication of static characteristic fields with sufficient approximation quality. For this embodiment example, multilayer perceptrons are chosen. An MLP consists of a number of neuron layers, with at least one input layer and one output layer. The structure is directed strictly forward and belongs to the group of feed-forward networks. Quite generally, neural networks map an m-dimensional input signal onto an n-dimensional output signal. In the feed-forward network considered here, the information to be processed is received by a layer of input neurons, the input layer. The input neurons process the input signals and forward them via weighted connections, so-called synapses, to one or more hidden neuron layers, the hidden layers. From the hidden layers, the signal is transmitted, likewise via weighted synapses, to the neurons of an output layer, which in turn generate the output signal of the neural network. In a forward-directed, fully connected MLP, each neuron of a certain layer is connected to all neurons of the following layer.
As usual, the number of layers and the number of neurons (network nodes) in a particular layer must be adapted to the respective problem. The simplest approach is to determine the ideal network structure empirically. In doing so, it must be borne in mind that if the number of neurons chosen is too large, the network merely reproduces the training patterns instead of learning, while with too small a number of neurons, correlations of the mapped parameters arise. Expressed differently, if the number of neurons chosen is too small, the function may not be representable. However, as the number of hidden neurons increases, the number of independent variables in the error function also increases. This leads to more local minima and to a greater probability of landing in precisely one of these minima. In the special case of back propagation, this problem can at least be reduced, e.g., by means of simulated annealing. In simulated annealing, a probability is assigned to the states of the network. In analogy to the cooling of liquid material from which crystals are formed, a high initial temperature $T$ is chosen. This temperature is reduced gradually, and the lower it is, the more slowly it is reduced. In analogy to the formation of crystals from a liquid, it is assumed that if the material is allowed to cool too quickly, the molecules do not arrange themselves according to the lattice structure. The crystal becomes impure and unstable at the affected locations. To prevent this, the material is allowed to cool so slowly that the molecules still have enough energy to jump out of a local minimum. In the case of neural networks, the same is done: the quantity $T$ is additionally introduced into a slightly modified error function. In the ideal case, this then converges toward a global minimum.
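The annealing idea can be illustrated on a one-dimensional error function with several local minima. The following Python sketch is an illustration under stated assumptions, not the network training procedure itself: the geometric cooling schedule (`cooling` factor), the uniform proposal step, and the example function are hypothetical choices. Because the temperature is multiplied by a constant factor, it falls by ever smaller absolute amounts the lower it gets, in line with the description above; worse states are accepted with probability $e^{-\Delta E / T}$, so that early on the search can leave local minima.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(error: Callable[[float], float], x0: float, t0: float = 5.0,
                        cooling: float = 0.995, steps: int = 5000,
                        step_size: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Minimize a 1-D error function by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, error(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        ce = error(candidate)
        # Always accept improvements; accept a worse state with probability exp(-dE / T)
        if ce <= e or rng.random() < math.exp(-(ce - e) / t):
            x, e = candidate, ce
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # constant factor: T decreases ever more slowly in absolute terms
    return best_x, best_e


if __name__ == "__main__":
    # Hypothetical error function: global minimum f(0) = -1, local minima near +/-pi, +/-2pi
    f = lambda x: 0.1 * x * x - math.cos(2 * x)
    print(simulated_annealing(f, x0=6.0))
```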
For the application to experience rating, MLPs with an at least three-layer structure have proved useful. This means that the networks comprise at least one input layer, a hidden layer, and an output layer. Within each neuron, the three processing steps of propagation, activation, and output take place. The output of the $i$-th neuron of the $k$-th layer is

$$
o_i^{k} = f\Big(\sum_{j=1}^{N_{k-1}} w_{ij}^{k}\, o_j^{k-1} + b_i^{k}\Big)
$$

where, e.g., for $k = 2$, the running index takes the values $j = 1, 2, \dots, N_1$; $N_{k-1}$ denotes the number of neurons of layer $k-1$, $w$ the weight, and $b$ the bias (threshold value). Depending on the application, the bias $b$ may be chosen the same or different for all neurons of a certain layer. As activation function, e.g., a log-sigmoid function may be chosen, such as

$$
f(\xi) = \frac{1}{1 + e^{-\xi}}
$$



The activation function (or transfer function) is applied in each neuron. Other activation functions, such as tangent-type functions (e.g., the hyperbolic tangent), are likewise possible according to the invention. With the back-propagation method, however, care must be taken that a differentiable activation function is used, such as a sigmoid function, since this is a prerequisite for the method. Binary activation functions such as

$$
f(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0 \end{cases}
$$

therefore do not work with the back-propagation method. In the neurons of the output layer, the outputs of the last hidden layer are summed in a weighted way. The activation function of the output layer may also be linear. The entirety of the weights $W^{k}$ and biases $B^{k}$, combined in the parameter and/or weighting matrices, determines the behavior of the neural network structure. For a three-layer network with a log-sigmoid hidden layer and a linear output layer, the result is thus

$$
O^{3} = B^{3} + W^{3} \cdot \big(1 + e^{-(W^{2} Y + B^{2})}\big)^{-1}
$$

where $Y$ is the input signal and the exponential and the inverse are taken elementwise.
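The output formula for a three-layer network can be sketched in plain Python as follows. This assumes the convention that `w2[i][j]` is the weight from input $j$ to hidden neuron $i$ (and analogously for `w3` from hidden neuron $j$ to output $i$); the function names `logsig` and `forward` are illustrative.

```python
import math
from typing import List, Sequence


def logsig(x: float) -> float:
    """Log-sigmoid activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(y: Sequence[float],
            w2: List[List[float]], b2: List[float],
            w3: List[List[float]], b3: List[float]) -> List[float]:
    """O^3 = B^3 + W^3 (1 + e^{-(W^2 Y + B^2)})^{-1}: log-sigmoid hidden layer, linear output."""
    # Hidden layer: weighted sum of the inputs plus bias, passed through the log-sigmoid
    hidden = [logsig(sum(w * x for w, x in zip(row, y)) + b) for row, b in zip(w2, b2)]
    # Output layer: linear weighted sum of the hidden outputs plus bias
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w3, b3)]
```

With all hidden weights zero, every hidden neuron outputs 0.5, so the output reduces to $B^3$ plus half the row sums of $W^3$.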

The way in which the network maps an input signal onto an output signal, i.e., the determination of the desired weights and biases of the network, is obtained by training the network with training patterns. The set of training patterns (index $p$) consists of the input signal

$$
Y^{p} = \big(y_1^{p}, \dots, y_m^{p}\big)
$$

and an output signal

$$
U^{p} = \big(u_1^{p}, \dots, u_n^{p}\big)
$$

In this embodiment example with the experience rating of claims, the training patterns comprise the known events $P_{i,f}$ with the known development values $P_{ikf}$ for all $k$, $f$, and $i$. The development values of the events to be extrapolated naturally cannot be used for training the neural networks, since the corresponding output values are lacking.

At the start of training, the weights of the hidden layers, in this example of neurons with a log-sigmoid activation function, may be initialized, e.g., according to Nguyen-Widrow (D. Nguyen, B. Widrow, "Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of Adaptive Weights," International Joint Conference on Neural Networks, Vol. 3, pp. 21-26, July 1990). If a linear activation function has been chosen for the neurons of the output layer, their weights may be initialized, e.g., by means of a symmetric random number generator. Various prior-art learning methods may be used for training the network, such as the back-propagation method, learning vector quantization, radial basis functions, the Hopfield algorithm, or the Kohonen algorithm. The task of the training method is to determine the synaptic weights $w_{ij}$ and biases $b_{ij}$ within the weighting matrix $W$ and/or the bias matrix $B$ in such a way that the input patterns $Y^{p}$ are mapped onto the corresponding output patterns $U^{p}$. To assess the learning stage, the absolute quadratic error

$$
Err = \frac{1}{2} \sum_{p} \sum_{j} \big(u_j^{p} - o_j^{p}\big)^2
$$

may be used, for example. The error $Err$ thus takes into account all patterns $P_{ikf}$ of the training basis, comparing the actual output signals $O^{p}$ with the target responses $U^{p}$ specified in the training basis. For this embodiment example, the back-propagation method is chosen as the learning method. The back-propagation method is a recursive method for optimizing the weight factors $w_{ij}$. In each learning step, an input pattern $Y^{p}$ is chosen at random and propagated through the network (forward propagation). Using the error function $Err$ described above, the error $Err^{p}$ on the presented input pattern is determined from the output signal generated by the network and the target response $U^{p}$ specified in the training basis. The changes of the individual weights $w_{ij}$ after the presentation of the $p$-th training pattern are proportional to the negative partial derivative of the error $Err^{p}$ with respect to the weight $w_{ij}$ (the so-called gradient descent method):

$$
\Delta w_{ij}^{p} = -\varepsilon \, \frac{\partial Err^{p}}{\partial w_{ij}}
$$
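The Nguyen-Widrow initialization cited above can be sketched as follows, in the form commonly given in textbooks: a scale factor $\beta = 0.7\,H^{1/n}$ for $H$ hidden neurons and $n$ inputs, weight vectors drawn uniformly from $[-0.5, 0.5]$ and rescaled to Euclidean norm $\beta$, and biases drawn uniformly from $[-\beta, \beta]$. Toolbox implementations may differ in detail; the function name and the seed handling are hypothetical.

```python
import math
import random
from typing import List, Tuple


def nguyen_widrow_init(n_inputs: int, n_hidden: int,
                       seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Nguyen-Widrow initialization of one hidden layer (textbook form)."""
    rng = random.Random(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)            # scale factor
    weights: List[List[float]] = []
    biases: List[float] = []
    for _ in range(n_hidden):
        w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(v * v for v in w)) or 1.0   # guard against an all-zero draw
        weights.append([beta * v / norm for v in w])     # rescale weight vector to norm beta
        biases.append(rng.uniform(-beta, beta))          # bias uniformly in [-beta, beta]
    return weights, biases
```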



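With the aid of the chain rule, the well-known adaptation rules, referred to as the back-propagation rule, can be derived from the partial derivative for the elements of the weighting matrix when the $p$-th training pattern is presented, where $net_i^{p} = \sum_j w_{ij}\, o_j^{p} + b_i$ is the weighted input of neuron $i$ and $o_j^{p}$ the output of neuron $j$ of the preceding layer:

$$
\Delta w_{ij}^{p} = \varepsilon \, \delta_i^{p} \, o_j^{p}
$$

with

$$
\delta_i^{p} = f'\big(net_i^{p}\big)\,\big(u_i^{p} - o_i^{p}\big)
$$

for the output layer, and

$$
\delta_i^{p} = f'\big(net_i^{p}\big) \sum_{k} \delta_k^{p}\, w_{ki}
$$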

for the hidden layers, respectively. Here the error is propagated through the network in the opposite direction (back propagation), beginning with the output layer, and is divided among the individual neurons according to the cost-by-cause principle, i.e., according to each neuron's contribution to the error. The proportionality factor $\varepsilon$ is called the learning factor. During the training phase, a limited number of training patterns is presented to a neural network, and these patterns characterize the mapping to be learned precisely enough. In this embodiment example, with the experience rating of damage events, the training patterns may comprise all known events $P_{i,f}$ with the known development values $P_{ikf}$ for all $k$, $f$, and $i$. However, a selection of the known events $P_{i,f}$ is also conceivable. If the network is afterwards presented with an input signal which does not agree exactly with the patterns of the training basis, the network interpolates or extrapolates between the training patterns within the scope of the learned mapping function. This property is called the generalization capability of the networks. Neural networks are characteristically highly fault tolerant, which is a further advantage over the prior-art systems. Since neural networks map a plurality of (partially redundant) input signals onto the desired output signal(s), they prove to be robust toward the failure of individual input signals and/or toward signal noise. Another interesting property of neural networks is their adaptive capability. It is therefore possible in principle to have a once-trained system relearn or adapt permanently or periodically during operation, which is likewise an advantage over the prior-art systems. Other learning methods may of course also be used, such as a method according to Levenberg-Marquardt (D. Marquardt, "An Algorithm for Least-Squares Estimation of Nonlinear Parameters," J. Soc. Ind. Appl. Math., pp. 431-441, 1963; and M.T. Hagan, M.B. Menhaj, "Training Feedforward Networks with the Marquardt Algorithm," IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, November 1994). The Levenberg-Marquardt method is a combination of the gradient method and the Newton method and has the advantage of converging faster than the above-mentioned back-propagation method, but it requires more storage capacity during the training phase.
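The back-propagation rules above (online updates with randomly ordered patterns, a log-sigmoid hidden layer with $f' = f(1-f)$, and a linear output layer with $f' = 1$) can be sketched in plain Python as follows. The network size, the learning factor `eps`, the uniform initialization in $[-0.5, 0.5]$, and the toy training patterns are assumptions for illustration; the sketch does not implement Levenberg-Marquardt.

```python
import math
import random
from typing import List, Sequence, Tuple

Pattern = Tuple[Sequence[float], Sequence[float]]   # (input Y^p, target U^p)


def logsig(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_backprop(patterns: Sequence[Pattern], n_hidden: int = 4, eps: float = 0.1,
                   epochs: int = 500, seed: int = 0):
    """Online back-propagation for one log-sigmoid hidden layer and a linear output layer."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w3 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b3 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(y):
        h = [logsig(sum(w * x for w, x in zip(row, y)) + b) for row, b in zip(w2, b2)]
        o = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w3, b3)]
        return h, o

    def err() -> float:
        # Err = 1/2 * sum over patterns p and outputs j of (u_j^p - o_j^p)^2
        return 0.5 * sum((u - o) ** 2 for y, target in patterns
                         for u, o in zip(target, forward(y)[1]))

    history: List[float] = [err()]
    order = list(range(len(patterns)))
    for _ in range(epochs):
        rng.shuffle(order)                               # present patterns in random order
        for p in order:
            y, u = patterns[p]
            h, o = forward(y)
            d_out = [uj - oj for uj, oj in zip(u, o)]    # output layer: f' = 1
            d_hid = [h[i] * (1 - h[i]) * sum(d_out[k] * w3[k][i] for k in range(n_out))
                     for i in range(n_hidden)]           # hidden layer: f'(net) * sum_k delta_k w_ki
            for k in range(n_out):                       # Delta w = eps * delta * o (bias: o = 1)
                for i in range(n_hidden):
                    w3[k][i] += eps * d_out[k] * h[i]
                b3[k] += eps * d_out[k]
            for i in range(n_hidden):
                for j in range(n_in):
                    w2[i][j] += eps * d_hid[i] * y[j]
                b2[i] += eps * d_hid[i]
        history.append(err())
    return (w2, b2, w3, b3), history
```

The hidden-layer deltas are computed before the output weights are updated, so both layers use the weights of the same forward pass.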

In the embodiment example, to determine the development values $P_{i,K-(i-j)+1,f}$, $i-1$ neural networks $N_{ij}$ are generated iteratively for each initial year $i$. For a certain initial year $i$, $j$ indicates the iteration number, with $j = 1, \dots, (i-1)$. Thus, for the $i$-th initial year, $i-1$ neural networks $N_{ij}$ are generated. The neural network $N_{i,j+1}$ depends recursively on the neural network $N_{ij}$. For weighting, i.e., training, a certain neural network $N_{ij}$, all development values $P_{p,q,f}$ with $p = 1, \dots, (i-1)$ and $q = 1, \dots, K-(i-j)$ of the events or losses $P_{p,f}$ may be used, for example. Depending on the application, however, a limited selection may also be useful. As mentioned, the data of the events $P_{p,f}$ may, for instance, be read out of a database and presented to the system via a data processing unit. A calculated development value $P_{i,k,f}$ may, e.g., be assigned to the respective event $P_{i,f}$ of an initial year $i$ and itself be presented to the system for determining the next development value (e.g., $P_{i,k+1,f}$) (Figures 1 to 6), or the assignment may take place only after all the sought development values $P$ have been determined (Figure 7).

In the first case (Figures 1 to 6), as described, development values $P_{i,k,f}$ with development years $k = 1, \dots, K$ are assigned to a certain event $P_{i,f}$ of an initial year $i$, where $i = 1, \dots, K$ and $K$ is the last known development year. For the first initial year $i = 1$, all development values $P_{1,k,f}$ are known. For each initial year $i = 2, \dots, K$, iterations $j = 1, \dots, (i-1)$ are carried out. In each iteration $j$, in a first step, a neural network $N_{ij}$ is generated with an input layer with $K-(i-j)$ input segments and an output layer. Each input segment comprises at least one input neuron, or at least as many input neurons as are needed to receive the input signal for a development value $P_{i,k,f}$. The neural networks are generated automatically by the system and may be implemented in hardware or software. In a second step, the neural network $N_{ij}$ is weighted (trained) with the available events $P_{m,f}$ of all initial years $m = 1, \dots, (i-1)$, using the development values $P_{m,1,f}, \dots, P_{m,K-(i-j),f}$ as input and $P_{m,K-(i-j)+1,f}$ as output. In a third step, the output values $O_{i,f}$ are determined by means of the neural network $N_{ij}$ for all events $P_{i,f}$ of the initial year $i$, the output value $O_{i,f}$ being assigned to the development value $P_{i,K-(i-j)+1,f}$ of the event $P_{i,f}$, and the neural network $N_{i,j+1}$ depending recursively on the neural network $N_{ij}$. Figure 1 shows the training and/or presentation phase of a neural network for determining the event value $P_{2,5,f}$ of an event $P_{2,f}$ in an upper $5 \times 5$ matrix, i.e., for $K = 5$. The dashed line T indicates the training phase, and the solid line R the determination phase after learning. Figure 2 shows the same for the third initial year, for determining $P_{3,4,f}$ ($B_{34}$), and Figure 3 for determining $P_{3,5,f}$. Figure 4 shows only the training phase for determining $P_{3,4,f}$ and $P_{3,5,f}$, the generated values $P_{3,4,f}$ ($B_{34}$) being used for training the network for determining $P_{3,5,f}$. In the figures, $A_{ij}$ denotes the known values, while $B_{ij}$ denotes values determined by means of the networks. Figure 5 shows the recursive generation of the neural networks for determining the values in row 3 of a $5 \times 5$ matrix, $i-1$ networks, i.e., two, being generated. Figure 6, on the other hand, shows the recursive generation of the neural networks for determining the values in row 5 of a $5 \times 5$ matrix, $i-1$ networks, i.e., four, again being generated.
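The iteration over initial years $i$ and iterations $j$ (first case, Figures 1 to 6) can be sketched as follows, assuming $I = K$ and one scalar development value per event. To keep the sketch short and runnable, the network $N_{ij}$ is replaced by a hypothetical stand-in, `fit_ratio_model`, which learns a single ratio and is *not* a neural network; any trained regressor with the same call signature could be plugged in via the `fit` parameter. What the sketch shows is which patterns train which model ($K-(i-j)$ inputs, the next value as target, all earlier initial years) and how predicted values are written back and reused.

```python
from typing import Callable, List, Optional, Sequence

Event = List[Optional[float]]                  # development values of one event; None = unknown
Model = Callable[[Sequence[float]], float]
Fitter = Callable[[List[List[float]], List[float]], Model]


def fit_ratio_model(inputs: List[List[float]], targets: List[float]) -> Model:
    """Hypothetical stand-in for training N_ij: one factor, mean of target / last input."""
    ratios = [t / x[-1] for x, t in zip(inputs, targets) if x[-1] != 0]
    factor = sum(ratios) / len(ratios) if ratios else 1.0
    return lambda x: factor * x[-1]


def complete_triangle(events: List[List[Event]],
                      fit: Fitter = fit_ratio_model) -> List[List[Event]]:
    """Fill unknown development values initial year by initial year (Figures 1 to 6 variant).

    events[i][f][k] holds P_{i+1,k+1,f+1} (0-based lists); every event has K entries.
    """
    K = len(events[0][0])
    filled = [[list(ev) for ev in year] for year in events]   # leave the input untouched
    for i in range(1, len(filled)):            # initial years 2..I in 1-based terms
        for j in range(1, i + 1):              # iterations j = 1..(i-1) in 1-based terms
            n_in = K - (i + 1 - j)             # K-(i-j) input segments
            # Training patterns from all events of the earlier initial years m < i
            xs = [ev[:n_in] for m in range(i) for ev in filled[m]]
            ys = [ev[n_in] for m in range(i) for ev in filled[m]]
            net = fit(xs, ys)                  # N_ij (here: the stand-in model)
            for ev in filled[i]:               # output O_{i,f} -> P_{i,K-(i-j)+1,f}
                ev[n_in] = net(ev[:n_in])
    return filled
```

For example, if every known event doubles from one development year to the next, the stand-in reproduces that pattern for the unknown cells.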

It is important to point out that, as an embodiment example, the event values $B_{ij}$ generated by the system may also be assigned only after all sought development values $P$ have been determined. The newly determined values are then not available as input values for the determination of further event values. Figure 7 shows such a method, the training basis being limited to the known event values $A_{ij}$. In other words, the neural networks may be identical for the same $j$: the neural network $N_{i+1,j=i}$ is generated for an initial time interval $i+1$, and all other neural networks $N_{i+1,j<i}$ correspond to networks of earlier initial time intervals. This means that a network which was once generated for the calculation of a particular event value $P_{ij}$ is also used, for the same $j$, for all event values of initial years $a > i$.

In the insurance cases discussed here, different neural networks may be trained, e.g., on different data. For example, the networks may be trained on the paid claims, on the incurred claims, on the paid and still outstanding claims (reserves), and/or on the paid and incurred claims. The best neural network for each case may be determined, e.g., by minimizing the mean absolute error between the predicted values and the actual values. For example, the ratio of the mean error to the mean predicted value (of the known claims) may be applied to the predicted values of the modeled claims in order to obtain their error. Where the predicted values of previous initial years are also used for the calculation of the following initial years, the error must of course be cumulated accordingly. This can be done, e.g., by taking the square root of the sum of the squares of the individual errors of each model.
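The error estimate described above can be sketched as two small helpers: the relative error (mean absolute error divided by the mean predicted value on the known claims), which is then multiplied by a modeled value to obtain its error, and the root-sum-of-squares cumulation for chained models. The function names are hypothetical.

```python
import math
from typing import Sequence


def relative_error(predicted_known: Sequence[float], actual_known: Sequence[float]) -> float:
    """Mean absolute error divided by the mean predicted value, on the known claims."""
    n = len(actual_known)
    mae = sum(abs(p - a) for p, a in zip(predicted_known, actual_known)) / n
    return mae / (sum(predicted_known) / n)


def cumulated_error(individual_errors: Sequence[float]) -> float:
    """Combine the errors of chained models: square root of the sum of their squares."""
    return math.sqrt(sum(e * e for e in individual_errors))
```

For instance, with a relative error of 5 %, a modeled value of 400 would carry an error of about 20; two chained models with errors 3 and 4 give a cumulated error of 5.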

To obtain a further estimate of the quality and/or training state of the neural networks, the predicted values can, e.g., also be fitted by means of the Pareto distribution mentioned above. This estimate can also be used, e.g., to determine the best neural network from among neural networks trained with different data sets (e.g., paid claims, outstanding claims, etc.), as described in the previous paragraph. With the Pareto distribution, the theoretical value of the $i$-th payment demand is

$$
T(i) = Th \, \big(1 - P(i)\big)^{-1/\alpha}
$$

where $\alpha$ is the fit parameter, $Th$ the threshold parameter (threshold value), $T(i)$ the theoretical value of the $i$-th payment demand, $O(i)$ the observed value of the $i$-th payment demand, $E(i)$ the error of the $i$-th payment demand, and $P(i)$ the cumulated probability of the $i$-th payment demand, with

$$
P(i+1) = P(i) + \frac{1}{n}
$$

and $n$ the number of payment demands. For the embodiment example here, the error of the system based on the proposed neural networks was compared with that of the chain ladder method using vehicle insurance data. The networks were compared once with the paid claims and once with the incurred claims. In order to compare the data, the individual values were cumulated over the development years. The direct comparison gave the following results for the selected example data (values per 1,000):





Initial   System Based on Neural Networks            Chain Ladder Method
Year      Paid Claims         Incurred Claims        Paid Claims          Incurred Claims
          (cumulated values)  (cumulated values)     (cumulated values)   (cumulated values)
1996      369.795 ± 5.333     371.551 ± 6.929        387.796 ± n/a        389.512 ± n/a
1997      769.711 ± 6.562     789.997 ± 8.430        812.304 ± 0.313      853.017 ± 15.704
1998      953.353 ± 40.505    953.353 ± 30.977       1099.710 ± 6.522     1042.908 ± 32.551
1999      1142.874 ± 84.947   1440.038 ± 47.390      1052.683 ± 138.221   1385.249 ± 74.813
2000      864.628 ± 99.970    1390.540 ± 73.507      1129.850 ± 261.254   1285.956 ± 112.668
2001      213.330 ± 72.382    288.890 ± 80.617       600.419 ± 407.718    1148.565 ± 439.112



The error shown here corresponds to the standard deviation, i.e., the 1σ error, of the indicated values. Particularly for later initial years, i.e., initial years with greater $i$, the system based on neural networks shows a clear advantage over the prior-art methods in the determination of values, in that the errors remain substantially stable. This is not the case in the prior art, where the error grows disproportionately with increasing $i$. For greater initial years $i$, a clear deviation in the amount of the cumulated values is evident between the chain ladder values and those obtained with the method according to the invention. This deviation is due to the fact that the IBNYR (Incurred But Not Yet Reported) losses were additionally taken into account in the chain ladder method. The IBNYR damage events would have to be added to the values shown above for the method according to the invention. For example, for the calculation of the portfolio reserves, the IBNYR damage events can be taken into account by means of a separate development (e.g., chain ladder). In reserving for individual losses or in determining loss amount distributions, however, the IBNYR damage events play no role.



